Skip to content

TEST: include icechunks zarr tree strategies here#3914

Open
ianhi wants to merge 4 commits intozarr-developers:mainfrom
ianhi:ian/tree-strats
Open

TEST: include icechunks zarr tree strategies here#3914
ianhi wants to merge 4 commits intozarr-developers:mainfrom
ianhi:ian/tree-strats

Conversation

@ianhi
Copy link
Copy Markdown
Contributor

@ianhi ianhi commented Apr 16, 2026

Up streaming our zarr tree hypothesis strategy from icechunk.

generates trees like these:

/
├── EtTWAYY/
│   ├── EtTWAYY.jEm  [array shape=(1,)]
│   ├── EtTWAYY/
│   │   ├── 7ek  [array shape=(1,)]
│   │   ├── g  [array shape=(1,)]
│   │   ├── 7H7_NaN  [array shape=(1,)]
│   │   └── NaN  [array shape=(1,)]
│   ├── NaN  [array shape=(1,)]
│   └── 91  [array shape=(1,)]
└── NaN/
    └── srtB  [array shape=(1,)]

Two commits

  • 398ecb4 adds the models and strategies - nearly a straight lift from icehcunk. added a bit more logic around materialize write mode, and zarr format + case sensitivity otherwise the same as in icechunk.
  • bb62db7 uses them in some tests - mostly as an example for anyone who wants to add more in the future.
  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

ianhi added 2 commits April 16, 2026 16:49
…trategy

Introduces declarative tree descriptors in `zarr.testing.models` (ArrayNode,
GroupNode) with `materialize()` / `from_store()` round-trip helpers, and a
hypothesis `trees()` strategy in `zarr.testing.strategies` that generates
realistic zarr hierarchies with prefix-colliding names (sibling
`foo`/`foo-bar`, cousin name reuse).

`trees()` accepts `case_insensitive: bool | None` to keep sibling names from
differing only in letter case when targeting stores backed by a case-insensitive
filesystem (macOS APFS, Windows NTFS default). Defaults to the current
platform's filesystem behavior.

`materialize()` accepts an optional `zarr_format` to exercise v2 and v3 from
the same tree descriptor.
Two property tests exercising zarr hierarchy code paths against the new
trees() hypothesis strategy:

- test_groups_and_arrays_split_children — verifies that Group.groups() and
  Group.arrays() return exactly the sub-groups and arrays respectively,
  across random hierarchies with prefix-colliding sibling names. Uses a
  MemoryStore since the partition logic is store-agnostic; local/zip
  coverage is already carried by the sibling imperative test.
- test_group_members_tree_roundtrip — materialize() / from_store() round-trip
  for any GroupNode tree.

Each test pins a hand-picked shape via @example so the known-good case
runs alongside the random draws.
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Apr 16, 2026
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Apr 16, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 81.67939% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.97%. Comparing base (dd5a321) to head (a8a2ca0).

Files with missing lines Patch % Lines
src/zarr/testing/models.py 72.50% 22 Missing ⚠️
src/zarr/testing/strategies.py 96.07% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3914      +/-   ##
==========================================
- Coverage   93.10%   92.97%   -0.14%     
==========================================
  Files          85       86       +1     
  Lines       11193    11324     +131     
==========================================
+ Hits        10421    10528     +107     
- Misses        772      796      +24     
Files with missing lines Coverage Δ
src/zarr/testing/strategies.py 93.41% <96.07%> (+0.50%) ⬆️
src/zarr/testing/models.py 72.50% <72.50%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

class GroupNode:
children: dict[str, ArrayNode | GroupNode] = field(default_factory=dict)

def walk(self, prefix: str = "") -> Iterator[tuple[str, Node]]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like "iter_nodes"or similar is a bit more expressive than "walk"

@d-v-b
Copy link
Copy Markdown
Contributor

d-v-b commented Apr 17, 2026

Thanks for this! I feel like if these array / group hierarchy methods are useful for testing, then they are just useful, and we should consider integrating them into the codebase. With that in mind I'm gonna leave some comments but they are just advisory, feel free to ignore.

if isinstance(child, GroupNode):
yield from child.walk(p)

def nodes(self, prefix: str = "", *, include_root: bool = False) -> list[str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these iteration routines can do IO. so returning list[stuff] requires doing all the IO before the first value is available. Iterator[stuff] or Generator[stuff] gives us a bit more flexibility if we want to stream results instead of doing all the IO up-front.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this applies in a world where we put these routines on the group class that's bound to a store)



@dataclass(frozen=True)
class GroupNode:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach to "structurally model a group" reminds me of GroupSpec in pydantic-zarr. this is clearly a useful representation, so we should figure out how we can fit it into zarr-python proper


GroupNode.from_store(some_memory_store)
"""
return sync(cls.from_store_async(store))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using sync here forces any consumer to use the zarr-python event loop. flagging this for visibility

Comment thread tests/test_group.py
zarr_format=st.sampled_from([2, 3]),
)
@pytest.mark.filterwarnings("ignore:Consolidated metadata:zarr.errors.ZarrUserWarning")
def test_groups_and_arrays_split_children(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there bugged behavior that this test was necessary to fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants